Abstract: [Objective] This study aims to improve fake review detection by strengthening the model’s ability to learn deep semantic information from review text and by addressing class imbalance in the data. [Methods] User behavior and text characteristics of the dataset were analyzed and used to automatically compute a cost-sensitive matrix based on inter-class separability, which improves the model’s ability to learn from imbalanced data. The model was further optimized by using BERT to encode the review text. [Results] Extensive experiments on the YelpCHI dataset showed that the proposed model outperformed existing state-of-the-art methods, improving the F1 score by 18% and the AUC by 12%. [Limitations] Although the proposed method achieves promising results, its applicability to other domains requires further study. [Conclusions] Computing category separability from user behavior and text features effectively improves the model’s performance in detecting fake reviews. Combining a cost-sensitive matrix with BERT’s text encoding is a promising approach to fake review detection.